Object Recognition For Security Screening and Long Range Video Surveillance

ABSTRACT

A method of detecting an object in image data that is deemed to be a threat includes annotating sections of at least one training image to indicate whether each section is a component of the object, encoding a pattern grammar describing the object using a plurality of first order logic based predicate rules, training distinct component detectors to each identify a corresponding one of the components based on the annotated training images, processing image data with the component detectors to identify at least one of the components, and executing the rules to detect the object based on the identified components.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to U.S. Provisional Application No. 61/443,448 filed on Feb. 16, 2011, and U.S. Provisional Application No. 61/443,296 filed on Feb. 16, 2011, the disclosure of each is incorporated by reference herein.

BACKGROUND OF THE INVENTION

1. Technical Field

The present disclosure relates generally to computer vision, and more particularly, to security screening and long range video surveillance using computer vision.

2. Discussion of Related Art

Security screening systems inspect checked and hand baggage, cargo, containers, passengers, etc. for content such as explosives, improvised explosive devices (IEDs), firearms, contraband, drugs, etc. They play a key role in the Homeland Defense/Security strategy for increased safety in airports, air and sea traffic. For instance, since August 2010 the government has mandated 100% air cargo screening, with possible extension to sea cargo. State-of-the-art security screening systems require improvement in a number of aspects. These include (a) efficient and effective automation for improved throughput and focused operator attention and (b) a systems view and integration of various components in screening, e.g., reconstruction, segmentation, detection, recognition, visualization, standards, platform, etc., to achieve an efficient screening workflow.

A current system for security screening involves two stages. In a first, automated, stage, X-Ray, CT, etc. scan data is obtained and image reconstruction is performed. Resulting images often encode material properties, such as density or effective atomic number Z_eff. Then, pixels or voxels of suspicious density and Z_eff are identified, and contiguous regions are segmented. Statistics of suspicious regions (e.g., mass, volume, etc.) are computed and compared to critical thresholds. In a second stage, identified suspicious regions are manually verified for occurrence of a threat by the human operator. This strategy is employed in many screening systems developed by various scanner vendors. However, these systems require a large amount of operator supervision, due to the large number of false alarms.

Further, there is an increasing need for fast extraction and review, from real-time and archived surveillance video, of activities involving humans, vehicles, packages or boats. This need has been driven by the rapid expansion of video camera network installations worldwide in response to enhanced site security and safety requirements. The amount of data acquired by such video surveillance devices today far exceeds the operator's capacity to understand its contents and meaningfully search through it. This represents a fundamental bottleneck in the security and safety infrastructure and has prevented video surveillance technology from reaching its full potential.

Automated video analytics modules operating over video surveillance systems provide one means of addressing this problem, by analyzing the contents of the video feed and generating a description of interesting events transpiring in the scene. However, these modules are inadequate to robustly detect human and vehicular activities in video.

SUMMARY OF THE INVENTION

According to an exemplary embodiment of the invention, a method of detecting an object in image data that is deemed to be a threat includes annotating sections of at least one training image to indicate whether each section is a component of the object, encoding a pattern grammar describing the object using a plurality of first order logic based predicate rules, training distinct component detectors to each identify a corresponding one of the components based on the annotated training images, processing image data with the component detectors to identify at least one of the components, and executing the rules to detect the object based on the identified components. The pattern grammar may be implemented as instructions in a processor, where executing of the rules is performed by the processor executing the instructions.

The image data may be output by a security screening device. In at least one embodiment, the training is performed using Adaptive Boosting.

In an embodiment, the threatening object is a knife where the annotated sections indicate whether each component is one of a handle, a guard, or a blade of the knife.

In an embodiment, the threatening object is a gun where the annotated sections indicate whether each component is one of a lock, a stock, or a barrel of the gun.

In an embodiment, the object is a detonator and the annotated sections indicate whether each component is one of a tube and an explosive material.

In an embodiment, the object is a bomb and the annotated sections indicate whether each component is one of a detonator, explosive material, a cable, and a battery.

The image data may be X-ray image data. The image data may be computed tomography (CT) image data.

In an embodiment, training includes determining uncertainty values for each of the rules, converting the rules into a knowledge-based artificial neural network, where each uncertainty value corresponds to a weight of a link in the neural network, and using a back-propagation algorithm modified to allow local gradients over a bilattice specific inference operation to optimize the link weights.

In an embodiment, the pattern grammar describes a visual pattern of the threatening object by encoding knowledge about contextual clues, scene geometry, and visual pattern constraints.

In an embodiment, the training of a corresponding one of the component detectors includes performing a physics-based perturbation on one of the annotated training images to generate a new annotated training image and training the distinct component detectors based on the annotated training images and the new annotated training image.

The perturbation may be a geometric transformation. The performing of the perturbation may include adding another object to be superimposed with a component in the training image to generate the new annotated training image.

According to an exemplary embodiment of the invention, a method of training a threat detector to detect an object in image data that is deemed to be a threat includes defining a pattern grammar to describe a visual pattern that is representative of the object, encoding the pattern grammar using a plurality of first order predicate based logic rules, and dividing an object into component parts, training distinct component detectors to each detect a corresponding one of the component parts, and generating the threat detector from the rules.

According to an exemplary embodiment of the invention, a method of detecting an activity in video data includes annotating sections of at least one training video to indicate whether each section is a component of the activity, encoding a pattern grammar describing the object using a plurality of first order logic based predicate rules, training distinct component detectors to each identify a corresponding one of the components based on the annotated training videos, processing video data with the component detectors to identify at least one of the components, and executing the rules to detect the activity based on the identified components.

BRIEF DESCRIPTION OF THE DRAWINGS

Exemplary embodiments of the invention can be understood in more detail from the following descriptions taken in conjunction with the accompanying drawings in which:

FIG. 1 illustrates a method of detecting an object deemed to be a threat according to an exemplary embodiment of the invention.

FIGS. 2A and 2B illustrate exemplary components of a threatening object that may be detected by application of the method.

FIG. 3 illustrates exemplary component detectors or classifiers that may be used to detect the components.

FIG. 4 illustrates additional classifiers that may be used to detect the threatening object using outputs of the component detectors of FIG. 3.

FIG. 5 illustrates exemplary training images that may be created by performing perturbations on training data used to train the component detectors.

FIGS. 6(a), 6(b), 6(c), 6(d), 6(e) and 6(f) show examples of different bilattices and the types of logic they can be used to model.

FIG. 7 is an example of a bilattice square.

FIG. 8 is an example of rules of a pattern grammar that may be used with the above method.

FIGS. 9 and 10 show examples of artificial neural networks.

FIG. 11 illustrates a method that may be performed during training of the component detectors according to an exemplary embodiment of the invention.

FIG. 12 illustrates an example of a computer system capable of implementing methods and systems according to embodiments of the present invention.

DETAILED DESCRIPTION

Exemplary embodiments of the invention are discussed in further detail with reference to FIGS. 1-12. This invention may, however, be embodied in different forms and should not be construed as limited to the embodiments set forth herein.

It is to be understood that the systems and methods described herein may be implemented in various forms of hardware, software, firmware, special purpose processors, or a combination thereof. In particular, at least a portion of the present invention may be implemented as an application comprising program instructions that are tangibly embodied on one or more program storage devices (e.g., hard disk, magnetic floppy disk, RAM, ROM, CD ROM, etc.) and executable by any device or machine comprising suitable architecture, such as a general purpose digital computer having a processor, memory, and input/output interfaces. It is to be further understood that, because some of the constituent system components and process steps depicted in the accompanying Figures may be implemented in software, the connections between system modules (or the logic flow of method steps) may differ depending upon the manner in which the present invention is programmed. Given the teachings herein, one of ordinary skill in the related art will be able to contemplate these and similar implementations of the present invention.

At least one embodiment of the invention was made in an effort to make detection of threatening objects (e.g., guns, knives, explosives, etc.) easier even if they are disguised or separated into their constituent component parts. For example, terrorists use both manufactured and improvised firearms. Firearms are even available from some manufacturers disguised as walking-sticks, and guns may be manufactured to look like pens, key rings, and many other day-to-day items. Criminals will sometimes saw off the barrel and butt of a firearm to help make it shorter or more concealable. Handguns not only differ in general shape and size, but can also be constructed from metals, plastics, composite materials, and wood butts or grips. Further, a terrorist may partially dismantle a gun to make the components less recognizable as a firearm. Typically guns will be concealed by terrorists within electrical items; within suitcase linings, to take advantage of metal reinforcing strips and suitcase frames; behind very dense items; or placed at acute angles. A threatening object, such as a firearm, can be viewed not as a “gun”, but as a set of parts (e.g., “lock, stock, and barrel”).

FIG. 1 illustrates a method of detecting an object deemed to be a threat according to an exemplary embodiment of the invention. The method includes annotating sections of training image data to identify sections of the image as being at least one of the constituent components of the object (S101).

The annotating step may be skipped if a sufficient repository of annotated training images is already available. The training image data may come from various modalities such as 2D X-ray, 3D Computed Tomography (CT), a millimeter wave scan, backscatter X-ray, etc.

The annotating may be performed manually by a user with experience in detecting the individual components. As shown in FIG. 2A, when the object is a gun 200, the user can annotate (label) the left part 201 as being the barrel, the middle part 202 as being the lock, and the right part 203 as being the stock. As shown in FIG. 2B, when the object is a knife 210, the user can annotate (label) the lower part 211 as the blade, the upper left part 212 as the guard, and the upper right part 213 as the handle. Although not shown in the figures, in other embodiments, the object can be a detonator or a bomb. When the object is a detonator, the user can annotate one part as being a tube and another part as being an explosive. In an embodiment, the tube is aluminum and the explosive is lead azide. When the object is a bomb, the user can annotate one part as being the detonator, another part as being the explosive, another part as being a cable (e.g., an electrical cable), and another part as being a battery.

However, embodiments of the invention are not limited to the component parts mentioned above or illustrated in FIG. 2A and FIG. 2B. For example, a gun or a knife may include additional parts, and the parts shown may be divided further into sub-parts, for respective annotation. Further, while the disclosure is discussed primarily with respect to knives, guns, and explosive devices, the invention is not limited to any particular type of threatening object. For example, the threatening object can include illicit drug paraphernalia, or other items that are typically disallowed during transport (e.g., air, rail, sea, etc.) such as box cutters, ice picks, scissors, bats, bows, arrows, grenades, screwdrivers, hammers, etc. Further, guns as discussed above may include various firearms such as rifles, shotguns, machine guns, etc.

The annotating may consist of marking the component parts with seed points, landmark points, or drawing an outline around each component part. For example, an outline or seed point of a particular color could be used to distinguish one component part of the object from another.

Referring back to FIG. 1, the method includes a step of encoding a pattern grammar describing the object using first order logic based predicate rules (S102). The grammar and the rules will be discussed in more detail below. The method next includes training component detectors to each detect a corresponding one of the component parts from the annotated training image data (S103). In at least one embodiment, the training is performed using Adaptive Boosting (AdaBoost).

For example, as shown in FIG. 3, when the threatening object is a knife and it is broken into component parts such as a blade, a guard, and a handle, the training generates a blade classifier 301, a guard classifier 302, and a handle classifier 303. Had the knife been broken into additional component parts due to the annotation of the training data, additional corresponding classifiers would have been generated. The classifiers may be configured to output binary data (e.g., it's a blade, it's not a blade, it's a guard, it's not a guard, etc.) and a confidence value indicating the level of confidence in the classification.
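By way of a non-limiting illustration only, the following Python sketch shows one way a component classifier that outputs a binary label together with a confidence value could be trained with AdaBoost over decision stumps. The two-feature toy data, the function names (`train_stump`, `adaboost`, `classify`), and the way the confidence is normalized are assumptions made for this example and are not taken from the disclosure.

```python
import math

def stump_predict(row, feature, threshold, polarity):
    """Weak learner: vote +polarity if the feature reaches the threshold, else -polarity."""
    return polarity if row[feature] >= threshold else -polarity

def train_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                err = sum(wi for wi, row, yi in zip(w, X, y)
                          if stump_predict(row, feature, threshold, polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, feature, threshold, polarity)
    return best

def adaboost(X, y, rounds=5):
    """Train an ensemble of (alpha, feature, threshold, polarity) stumps; labels are +1/-1."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, feature, threshold, polarity = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1.0 - 1e-10)   # avoid log(0) on separable data
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, feature, threshold, polarity))
        # Increase the weight of misclassified samples, decrease the others.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, feature, threshold, polarity))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def classify(ensemble, row):
    """Return (label, confidence): label is +1/-1, confidence is the normalized vote margin."""
    score = sum(alpha * stump_predict(row, f, t, p) for alpha, f, t, p in ensemble)
    norm = sum(alpha for alpha, *_ in ensemble) or 1.0
    return (1 if score >= 0 else -1), abs(score) / norm

# Hypothetical toy features per annotated section: [elongation, mean intensity].
X = [[0.9, 0.2], [0.8, 0.3], [0.85, 0.25], [0.2, 0.7], [0.3, 0.6], [0.1, 0.8]]
y = [1, 1, 1, -1, -1, -1]          # +1 = "is a blade", -1 = "is not a blade"
blade_classifier = adaboost(X, y, rounds=3)
print(classify(blade_classifier, [0.88, 0.2]))
```

In this sketch one such ensemble would be trained per annotated component (blade, guard, handle), each on its own positive and negative sections.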

One or more component parts may be distinctive of a particular type or sub-type of threatening object. For example, one blade may be distinctive of not only a knife, but more particularly of a particular type of knife, such as a bowie knife, a butterfly knife, etc. Thus, the components in the training data can be further marked with the type or sub-type. Thus, referring to FIG. 4, classifiers such as a Bowie classifier 401 and a butterfly classifier 402 would be generated, in addition to the overall knife classifier 403. When the sub-type information is unknown, classifiers 401 and 402 can be omitted, and then the output of classifiers 301-303 would go directly to the knife classifier 403. In this case, the component classifiers 301-303 receive low level input features and the knife classifier 403 receives component features (e.g., features of blades, guards, etc.) from the component classifiers. Each of the classifiers uses the rules of the pattern grammar to interpret features input to them against the training data to make their respective classifications.

However, the amount of training image data available may be relatively small. Thus, in at least one embodiment of the invention, the available training data is augmented with various perturbation (e.g., physics-based) models with respect to well defined statistics. A perturbation is any one of various geometric transformations that can be performed on the entire object or part of the object. For example, a part of an edge of an object can be rotated, lengthened, or shortened, or the width of an area of an object can be increased or decreased. In another example, the entire object could be enlarged, shrunk, rotated, etc. For example, the length of the stock of a shotgun can be shortened to simulate a sawed-off shotgun, or the barrel of a gun can be lengthened to simulate different versions of a gun.

FIG. 5 illustrates examples 501->502 and 511->512 of different perturbations, such as geometric transformations being performed on existing training data to generate additional training data. As shown in FIG. 5, these perturbations may include adding artifacts or noise to an existing training image. The additional artifacts can be random objects, noise, or known objects. For example, additional artifacts can be modeled from items typically found in a suitcase, but which would otherwise obscure or interfere with the view provided by the scanner (e.g., a metal pocket watch, a paper clip, etc.). The additional objects added to training data overlap or superimpose at least one component (e.g., the blade) of the threatening object. Based on the type of object, the overlapping may yield different results. For example, if the added object is a material that completely blocks X-rays, such as lead, the overlapping portion will appear very dark. If the added object is some other material that does not completely block X-rays, then the overlapping portion may appear somewhat darker, or it may be unaffected if X-rays are not obstructed in any way. Due to the above-described perturbations (modifications), additional synthetic images are added to the training data as examples of the threatening object to increase the detection accuracy of a component detector.
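By way of a non-limiting illustration only, the following Python sketch shows two perturbations of the kind described above applied to a toy annotated image: a geometric transformation (a horizontal flip that also moves the annotation) and the superimposition of an added object. The sketch assumes each pixel stores a transmitted intensity in [0, 1] (1.0 meaning unobstructed, 0.0 meaning fully blocked), assumes a multiplicative attenuation model for overlapping materials, and represents an annotation as a bounding box (top, left, bottom, right) with exclusive bottom and right edges. These conventions and the function names are assumptions for this example, not details from the disclosure.

```python
def flip_horizontal(image, box):
    """Mirror the image left-to-right and move the annotated box accordingly."""
    width = len(image[0])
    flipped = [list(reversed(row)) for row in image]
    top, left, bottom, right = box
    return flipped, (top, width - right, bottom, width - left)

def superimpose(image, transmission, top, left, height, width):
    """Overlay a rectangular object that passes `transmission` of the incident X-rays.

    transmission = 0.0 models a fully blocking material (e.g., lead): overlap turns dark.
    transmission = 1.0 models a material that does not obstruct X-rays: overlap unchanged.
    """
    out = [row[:] for row in image]
    for r in range(top, min(top + height, len(out))):
        for c in range(left, min(left + width, len(out[0]))):
            out[r][c] *= transmission
    return out

# Toy 2x4 training image: column 1 holds a "blade" that attenuates half the beam.
image = [[1.0, 0.5, 1.0, 1.0],
         [1.0, 0.5, 1.0, 1.0]]
blade_box = (0, 1, 2, 2)

flipped, flipped_box = flip_horizontal(image, blade_box)      # geometric perturbation
occluded = superimpose(image, 0.0, 0, 1, 2, 1)                # lead-like object over the blade
partially = superimpose(image, 0.5, 0, 1, 2, 1)               # semi-transparent object
augmented_training_set = [(image, blade_box), (flipped, flipped_box),
                          (occluded, blade_box), (partially, blade_box)]
```

Each synthetic image keeps an annotation, so it can be added to the training set alongside the original.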

Referring back to FIG. 1, the method includes processing image data with the trained component detectors to identify the components of a threatening object (S104). For example, the image data may come from various modalities such as 2D X-ray, 3D Computed Tomography (CT), a millimeter wave scan, backscatter X-ray, etc. For example, assume that the blade classifier 301, guard classifier 302, and handle classifier 303 of FIG. 3 correspond to the trained component detectors and the processing runs each classifier against the image data or against features generated from low level detectors that process the image data. The classifiers/detectors of FIG. 3 may be referred to as data driven classifiers, which may be trained for various views. There may be additional classifiers present that identify various shape features (e.g., ridge-like features of the blade) of the threatening object. The shape features and the component based features (e.g., is a blade/is not a blade) can be fed to the knife classifier 403 for classification.

Next, the method of FIG. 1 includes executing the rules to detect the threatening object based on the components identified by the trained component detectors (S105).

In at least one embodiment, the rules can be modeled using bilattice formalism. FIG. 6, which contains FIGS. 6(a), 6(b), 6(c), 6(d), 6(e) and 6(f), shows examples of different bilattices and the types of logic they can be used to model. FIG. 6(a), for instance, models classical two valued logic, FIG. 6(b) models three valued logics, FIG. 6(c) models Belnap's four valued logics, FIGS. 6(d) and 6(e) model traditional and prioritized default logics, and FIG. 6(f) models continuous valued logics.

Further to FIG. 6, the choice of different lattices that compose the bilattice gives rise to different logics, as shown in the figures: FIG. 6(a) shows a bilattice for two valued logics (trivial bilattice) with only true and false nodes; FIG. 6(b) a bilattice for three valued logic with an additional node for unknown; FIG. 6(c) a bilattice for four valued logics with an additional node for contradiction; FIG. 6(d) a bilattice for default logics; FIG. 6(e) a bilattice for prioritized default logics; and FIG. 6(f) a bilattice for continuous valued logic.

In at least one embodiment, a reasoning system is looked upon as a passive rational agent capable of reasoning under uncertainty. Uncertainties assigned to the rules that guide reasoning, as well as detection uncertainties reported by the low level detectors, are taken from a set structured as a bilattice. These uncertainty measures are ordered along two axes, one along the source's degree of information and the other along the agent's degree of belief. A single rule applied to its set of corresponding facts is referred to as a source here. There can be multiple rules deriving the same proposition (both positive and negative forms of it) and therefore we have multiple sources of information.

A lattice is a set L equipped with a partial ordering ≦ over its elements, a greatest lower bound (glb) and a least upper bound (lub), and is denoted as ℒ=(L, ≦), where glb and lub are operations from L×L→L that are idempotent, commutative and associative. Such a lattice is said to be complete iff for every nonempty subset M of L, there exists a unique lub and glb.

A bilattice is a triple B=(B, ≦t, ≦k), where B is a nonempty set containing at least two elements and (B, ≦t), (B, ≦k) are complete lattices. Informally, a bilattice is a set B of uncertainty measures composed of two complete lattices (B, ≦t) and (B, ≦k), each of which is associated with a partial order, ≦t and ≦k respectively. The ≦t partial order (agent's degree of belief) indicates how true or false a particular value is, with f being the minimal and t being the maximal, while the ≦k partial order indicates how much is known about a particular proposition. The minimal element here is ⊥ (completely unknown) while the maximal element is ⊤ (representing a contradictory state of knowledge where a proposition is both true and false). The glb and the lub operators on the ≦t partial order are ∧ and ∨ and correspond to the usual logical notions of conjunction and disjunction, respectively. The glb and the lub operators on the ≦k partial order are ⊗ and ⊕, respectively, where ⊕ corresponds to the combination of evidence from different sources or lines of reasoning while ⊗ corresponds to the consensus operator. A bilattice is also equipped with a negation operator ¬ that inverts the sense of the ≦t partial order while leaving the ≦k partial order intact, and a conflation operator − which inverts the sense of the ≦k partial order while leaving the ≦t partial order intact.

The intuition is that every piece of knowledge, be it a rule or an observation from the real world, provides different degrees of information. An agent that has to reason about the state of the world based on this input will have to translate the source's degree of information to its own degree of belief. Ideally, the more information a source provides, the more strongly an agent is likely to believe it (e.g., closer to the extremities of the t-axis). The only exception to this rule is when contradictory information is present. For example, when two sources contradict each other, it will cause the agent's degree of belief to decrease despite the increase in information content. It is this decoupling of the sources and the ability of the agent to reason independently along the truth axis that helps us address the issues raised in the previous section. It is noted that the line joining ⊥ and ⊤ represents the line of indifference. If the final uncertainty value associated with a hypothesis lies along this line, it means that the “degree of belief for” and “degree of belief against” cancel each other out and the agent cannot say whether the hypothesis is true or false. Ideally the final uncertainty values should be either f or t, but noise in observations as well as less than completely reliable rules typically prevents such. The horizontal line joining t and f is the line of consistency. For any point along this line, the “degree of belief for” will be equal to “(1 − degree of belief against)” and thus the final answer will be consistent.

A rectangular bilattice is a structure Λ⊙P=(L×R, ≦t, ≦k), where for every x₁,x₂∈Λ and y₁,y₂∈P,

$$(x_1, y_1) \leq_t (x_2, y_2) \Leftrightarrow x_1 \leq_L x_2 \text{ and } y_1 \geq_R y_2,$$

$$(x_1, y_1) \leq_k (x_2, y_2) \Leftrightarrow x_1 \leq_L x_2 \text{ and } y_1 \leq_R y_2.$$

An element (x₁,y₁) of the rectangular bilattice Λ⊙P may be interpreted such that x₁ represents the amount of belief for some assertion while y₁ represents the amount of belief against it. If one denotes the glb and lub operations of complete lattices Λ=(L, ≦_L) and P=(R, ≦_R) by ∧_L and ∨_L, and by ∧_R and ∨_R, respectively, one can define the glb and lub operations along each axis of the bilattice Λ⊙P as follows

$$\begin{aligned}
(x_1, y_1) \wedge (x_2, y_2) &= (x_1 \wedge_L x_2,\; y_1 \vee_R y_2), \\
(x_1, y_1) \vee (x_2, y_2) &= (x_1 \vee_L x_2,\; y_1 \wedge_R y_2), \\
(x_1, y_1) \otimes (x_2, y_2) &= (x_1 \wedge_L x_2,\; y_1 \wedge_R y_2), \\
(x_1, y_1) \oplus (x_2, y_2) &= (x_1 \vee_L x_2,\; y_1 \vee_R y_2).
\end{aligned} \qquad (1)$$

Of interest to embodiments of the invention is a particular class of rectangular bilattices where Λ and P coincide. These structures are called squares and Λ⊙Λ is abbreviated as Λ². Since detection likelihoods reported by the low level detectors are typically normalized to lie in the [0,1] interval, the underlying lattice that one is interested in is Λ=([0,1], ≦). The bilattice that is formed by Λ² is depicted in FIG. 7 with the bilattice square B=([0,1]², ≦t, ≦k). Every element of this bilattice is of the form <evidence_for, evidence_against>. Note that with this choice of the lattice, ≦ becomes a complete ordering, meaning all members of the lattice are comparable. The definition of a rectangular bilattice is modified such that <x₁,y₁>≦t<x₂,y₂> ⇔ x₁−y₁≦x₂−y₂ and <x₁,y₁>≦k<x₂,y₂> ⇔ x₁+y₁≦x₂+y₂. Each element in this bilattice is a tuple with the first element encoding evidence for a proposition and the second encoding evidence against. In this bilattice, the element f (false) is denoted by the element <0,1>, indicating no evidence for but full evidence against; similarly, element t is denoted by <1,0>, element ⊥ by <0,0>, indicating no information at all, and ⊤ is denoted by <1,1>. To fully define glb and lub operators along both the axes of the bilattice as listed in equation (1), one needs to define the glb and lub operators for the underlying lattice ([0,1], ≦). A popular choice for such operators is triangular norms and triangular conorms. The triangular norm may be used to model the glb operator and the triangular conorm may be used to model the lub operator within each lattice.
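By way of a non-limiting illustration only, the two orderings of the bilattice square can be sketched in Python as follows, with each element held as a tuple (evidence_for, evidence_against). The sketch takes f as <0,1>, t as <1,0> (full evidence for, none against), ⊥ as <0,0> and ⊤ as <1,1>; the constant and function names are chosen for this example only.

```python
# Distinguished elements of the bilattice square ([0,1]^2, <=t, <=k).
F = (0.0, 1.0)              # false: no evidence for, full evidence against
T = (1.0, 0.0)              # true: full evidence for, no evidence against
UNKNOWN = (0.0, 0.0)        # bottom of the knowledge order: no information
CONTRADICTION = (1.0, 1.0)  # top of the knowledge order: full evidence both ways

def leq_t(a, b):
    """Truth order: a <=t b iff (for - against) of a does not exceed that of b."""
    return a[0] - a[1] <= b[0] - b[1]

def leq_k(a, b):
    """Knowledge order: a <=k b iff the total evidence in a does not exceed that in b."""
    return a[0] + a[1] <= b[0] + b[1]

def on_line_of_indifference(a, tol=1e-9):
    """True when evidence for and against cancel, i.e. a lies between UNKNOWN and CONTRADICTION."""
    return abs(a[0] - a[1]) <= tol

print(leq_t(F, T), leq_k(UNKNOWN, T), on_line_of_indifference((0.5, 0.5)))
# prints: True True True
```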

A mapping J:[0,1]×[0,1]→[0,1] is a triangular norm (t-norm) iff J satisfies the properties: Symmetry: J(a,b)=J(b,a), ∀ a,b∈[0,1]; Associativity: J(a,J(b,c))=J(J(a,b),c), ∀ a,b,c∈[0,1]; Monotonicity: J(a,b)≦J(a′,b′) if a≦a′ and b≦b′; and One identity: J(a,1)=a, ∀ a∈[0,1]. A mapping S:[0,1]×[0,1]→[0,1] is a triangular conorm (t-conorm) iff S satisfies the properties: Symmetry: S(a,b)=S(b,a), ∀ a,b∈[0,1]; Associativity: S(a,S(b,c))=S(S(a,b),c), ∀ a,b,c∈[0,1]; Monotonicity: S(a,b)≦S(a′,b′) if a≦a′ and b≦b′; and Zero identity: S(a,0)=a, ∀ a∈[0,1].

If J is a t-norm, then the equality S(a,b)=1−J(1−a,1−b) defines a t-conorm and one says that S is derived from J. There are a number of possible t-norms and t-conorms one can choose. In an embodiment, for the underlying lattice L=([0,1], ≦), one chooses the t-norm such that J(a,b)=a∧_L b=ab and consequently chooses the t-conorm as S(a,b)=a∨_L b=a+b−ab. Based on this, the glb and lub operators for each axis of the bilattice B can then be defined as per equation (1).
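By way of a non-limiting illustration only, the product t-norm, its derived t-conorm, and the four bilattice operators of equation (1) can be sketched in Python as follows. Elements are tuples (evidence_for, evidence_against); the function names are chosen for this example only.

```python
def t_norm(a, b):
    """Product t-norm J(a, b) = ab, used as glb on ([0,1], <=)."""
    return a * b

def t_conorm(a, b):
    """Derived t-conorm S(a, b) = 1 - J(1 - a, 1 - b) = a + b - ab, used as lub."""
    return a + b - a * b

def conj(p, q):
    """Conjunction (glb along <=t): evidence for is met, evidence against is joined."""
    return (t_norm(p[0], q[0]), t_conorm(p[1], q[1]))

def disj(p, q):
    """Disjunction (lub along <=t): evidence for is joined, evidence against is met."""
    return (t_conorm(p[0], q[0]), t_norm(p[1], q[1]))

def consensus(p, q):
    """Consensus (glb along <=k): meet on both coordinates."""
    return (t_norm(p[0], q[0]), t_norm(p[1], q[1]))

def combine(p, q):
    """Evidence combination (lub along <=k): join on both coordinates."""
    return (t_conorm(p[0], q[0]), t_conorm(p[1], q[1]))

# A rule weighted <0.4, 0.6> applied to a fact weighted <0.9, 0.1>:
print(conj((0.4, 0.6), (0.9, 0.1)))   # approximately (0.36, 0.64)
```

With this choice, conjunction can only lower the evidence for a hypothesis, while combining sources along the knowledge axis can only accumulate evidence on either side.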

Inference in bilattice based reasoning frameworks is performed by computing the closure over the truth assignment. Given a declarative language L, a truth assignment is a function φ: L→B where B is a bilattice on truth values or uncertainty measures. Let K be the knowledge base and φ be a truth assignment, labeling every formula k∈K; then the closure over φ, denoted cl(φ), is the truth assignment that labels information entailed by K. For example, if φ labels sentences {p, q←p}∈K as <1,0> (true), i.e., φ(p)=<1,0> and φ(q←p)=<1,0>, then cl(φ) should also label q as <1,0> as it is information entailed by K. Entailment is denoted by the symbol ‘⊨’ (K⊨q).

Let S_q⁺ ⊂ L be the collection of minimal subsets of sentences in K entailing q. For each U∈S_q⁺, the uncertainty measure to be assigned to the conjunction of elements of U is the term

$$\bigwedge_{p \in U} cl(\varphi)(p) \qquad (2)$$

which represents the conjunction of the closure of the elements of U. Recall that ∧ and ∨ are glb and lub operators along the ≦t ordering and ⊗ and ⊕ are operators along the ≦k axis. The symbols ⋀, ⋁, ⨂, ⨁ are their infinite counterparts such that ⨁_{p∈S} p = p₁⊕p₂⊕ . . . , etc. It is important to note that this term is merely a contribution to the final uncertainty measure of q and not the final uncertainty measure itself. The reason it is merely a contribution is that there could be other sets of sentences in S_q⁺ that entail q, representing different lines of reasoning (or, in the instant case, different rules and supporting facts). The contributions of these sets of sentences need to be combined using the ⊕ operator along the information (≦k) axis. Also, if the expression in equation (2) evaluates to false, then its contribution to the value of q should be <0,0> (unknown) and not <0,1> (false). These arguments suggest that the closure over φ of q is

$\begin{matrix}{{{{cl}(\varphi)}(q)} = {\underset{U \in S_{q}^{+}}{\oplus}{\bot{\bigvee\left\lbrack {\bigwedge\limits_{p \in U}{{{cl}(\varphi)}(p)}} \right\rbrack}}}} & (3)\end{matrix}$

where τ is <0,0>. This is however, only part of the information. One canalso take into account the sets of sentences entailing

q. Let S_(q) ⁻ be collections of minimal subsets in K entailing

q. Aggregating information from S_(q) ⁻ yields the following expression

$\begin{matrix}{{{{cl}(\varphi)}(q)} = {\underset{U \in S_{q}^{+}}{\oplus}{\bot{\bigvee{\left\lbrack {\bigwedge\limits_{p \in U}{{{cl}(\varphi)}(p)}} \right\rbrack \oplus {{\underset{U \in S_{q}^{-}}{\oplus}{\bot{\bigvee{\left\lbrack {\bigwedge\limits_{p \in U}{{{cl}(\varphi)}(p)}} \right\rbrack.}}}}}}}}}} & (4)\end{matrix}$

Table 1 shows an example, using a simplified logic program, illustrating the process of computing the closure as defined above by combining evidence from three sources. In this example, the final uncertainty value computed is <0.4944, 0.72>. This indicates that evidence against the hypothesis at (25, 95) at scale 0.9 exceeds evidence in favor of it and, depending on the final threshold for detection, this hypothesis is likely to be rejected. Table 1 illustrates an example showing an inference using closure within a ([0,1]², ≦t, ≦k) bilattice.

TABLE 1

Assume the following set of rules and facts:

Rules:
φ(knife(X, Y, S) ← blade(X, Y, S)) = <0.40, 0.60>
φ(knife(X, Y, S) ← handle(X, Y, S)) = <0.30, 0.70>
φ(¬knife(X, Y, S) ← ¬scene_consistent(X, Y, S)) = <0.90, 0.10>

Facts:
φ(blade(25, 95, 0.9)) = <0.90, 0.10>
φ(handle(25, 95, 0.9)) = <0.70, 0.30>
φ(¬scene_consistent(25, 95, 0.9)) = <0.80, 0.20>

Inference is performed as follows:
cl(φ)(knife(25, 95, 0.9))
= <0, 0> ∨ [<0.4, 0.6> ∧ <0.9, 0.1>] ⊕ <0, 0> ∨ [<0.3, 0.7> ∧ <0.7, 0.3>] ⊕ ¬(<0, 0> ∨ [<0.9, 0.1> ∧ <0.8, 0.2>])
= <0.36, 0> ⊕ <0.21, 0> ⊕ ¬<0.72, 0>
= <0.4944, 0> ⊕ <0, 0.72>
= <0.4944, 0.72>
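By way of illustration only, the inference of Table 1 can be recomputed with the following minimal Python sketch. It assumes the product t-norm and the probabilistic-sum t-conorm on each axis of the bilattice; the function names (conj, disj, combine, neg, closure) are hypothetical and are not part of the described system.

```python
from functools import reduce

# Lattice operators on [0, 1]; assumed here to be product and probabilistic sum.
def t_norm(a, b):
    return a * b

def t_conorm(a, b):
    return a + b - a * b

# Bilattice operators on <evidence_for, evidence_against> pairs.
def conj(p, q):      # meet along the truth axis
    return (t_norm(p[0], q[0]), t_conorm(p[1], q[1]))

def disj(p, q):      # join along the truth axis
    return (t_conorm(p[0], q[0]), t_norm(p[1], q[1]))

def combine(p, q):   # join along the information axis (the ⊕ operator)
    return (t_conorm(p[0], q[0]), t_conorm(p[1], q[1]))

def neg(p):          # negation swaps evidence for and evidence against
    return (p[1], p[0])

UNKNOWN = (0.0, 0.0)

def closure(positive, negative):
    """Combine (rule, fact) pairs supporting q with pairs supporting not-q."""
    def aggregate(pairs):
        terms = (disj(UNKNOWN, conj(rule, fact)) for rule, fact in pairs)
        return reduce(combine, terms, UNKNOWN)
    return combine(aggregate(positive), neg(aggregate(negative)))

knife = closure(
    positive=[((0.40, 0.60), (0.90, 0.10)),   # knife <- blade
              ((0.30, 0.70), (0.70, 0.30))],  # knife <- handle
    negative=[((0.90, 0.10), (0.80, 0.20))],  # not knife <- not scene_consistent
)
print(tuple(round(v, 4) for v in knife))
```

The disjunction with <0,0> zeroes the evidence-against component of each rule contribution, so only the evidence-for components of the three rules survive into the final pair.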

In addition to using the explanatory ability of logical rules, one can also provide these explanations to the user as justification of why the system believes that a given hypothesis is a threatening object (e.g., a knife, a gun, etc.). The system provides a straightforward technique to generate proofs from its inference tree. Since all of the bilattice based reasoning may be encoded as meta-logical rules in a logic programming language, predicates can be added that succeed when the rule fires and propagate character strings through the inference tree up to the root where they are aggregated and displayed. Such proofs can either be dumps of the logic program itself or be English text. In one implementation, the output of the logic program is provided as the proof tree.

A knowledge base can then be defined to detect different patterns of interest. One may start by defining a number of predicates and their associated parameters pertinent to the problem at hand. For instance, for the knife detection problem, atoms such as knife(X,Y,S) (meaning there exists a knife at location (X,Y) and scale S in the image), blade(X,Y,S), guard(X,Y,S), handle(X,Y,S), etc. can be defined. Also defined are relational and geometric predicates such as (X₁,Y₁,S₁,X₂,Y₂,S₂), smaller(X₁,Y₁,S₁,X₂,Y₂,S₂), and sceneconsistent(X,Y,S) (meaning the hypothesis at (X,Y) and scale S is consistent with the scene geometry and conforms, within bounds, to the expected size of an object at the location).

The next step involves specification of the pattern grammar for defining the threatening object, as logical rules, over these defined atoms. Such rules would capture different aspects of the pattern to be recognized such as those shown in FIG. 8, which illustrates a sample subset of rules for knife detection. Rules in such systems can be learnt automatically; however, such approaches are typically computationally very expensive. In one embodiment the rules are manually encoded while automatically learning the uncertainties associated with them.

A desirable property of any reasoning framework is scalability. One may expect scalability in vision systems as different objects or pattern classes are hierarchically composed of constituent patterns that share features like textures, edges, etc. and as objects inhabit the same optical world and are imaged by similar optical sensors. Scalability is seen herein as an aspect of the present invention as a design principle wherein the model description is modular, hierarchical and compositional, reflecting the above understanding of the world. The provided framework results in scalable systems if models are appropriately described as such.

With this goal in mind, the following design principle is provided as an aspect of the present invention for object pattern grammar specification. The rule specification is partitioned into three broad categories: object composition model based, object embodiment model based and object context model based.

Rules encoding composition models capture a hierarchical representation of the object pattern as a composition of its constituent part detections. These parts might by themselves be composed of subparts. Rules in this category try to support or refute the presence of a pattern based on the presence or absence of its constituent parts.

Embodiment model rules model knowledge about the object pattern's geometric layout and their embodiment in 3D projective spaces. Context model rules attempt to model the surrounding context within which the pattern of interest is embedded. These rules would for example model interactions between a given object and other objects or other scene structures.

There typically exist multiple rules that derive the same proposition. These multiple rules are interpreted in logic programming as disjunctions (i.e., rule 1 is true or rule 2 is true, etc.). Writing rules in this manner makes each rule independently ‘vote’ for the proposition to be inferred. This disjunctive specification results in a scalable solution where the absence of a single observation does not completely preempt the final output, but merely reduces its final confidence value. As can be seen from the subset of rules in FIG. 8, the inference tree formed would be comprised of conjunctions, disjunctions and different kinds of negations.

The pattern grammar for the threatening object detection problem is formulated as per the broad categories listed in the previous section. Component based rules hypothesize that a threatening object is present at a particular location if one or more of the component part detectors described above detects a component part there. In other words, if a blade is detected at some location, one may say that there exists a knife there. There are positive rules, one each for the blade, guard, and handle, as well as negative rules that fire in the absence of these detections.

Geometry based rules validate or reject the threatening object (e.g., knife, gun, etc.) hypotheses based on geometric and scene information. This information is entered a priori in the system at setup time. Information is employed about expected length of knives and regions of expected handle locations. The expected image length rule is based on homography information and domain knowledge. For example, fixing a Gaussian at a knife's expected length allows generation of scene consistency likelihoods for a particular hypothesis given its location and size. The expected handle location region is a region demarcated in the image outside of which no valid handle can occur and therefore serves to eliminate false positives.
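The Gaussian-based scene consistency likelihood mentioned above can be sketched as follows. This is a hedged illustration: the function name, the unnormalised form of the Gaussian, and the example values for the expected length and spread are assumptions, since the text does not specify them.

```python
import math

def scene_consistency(observed_length, expected_length, sigma):
    """Unnormalised Gaussian score in (0, 1], equal to 1 when the lengths match.

    expected_length would come from homography information and domain
    knowledge at the hypothesis location; sigma is a hypothetical tolerance.
    """
    z = (observed_length - expected_length) / sigma
    return math.exp(-0.5 * z * z)

# Hypothetical numbers: a knife is expected to span 120 pixels at this location.
for length in (120.0, 135.0, 200.0):
    print(length, round(scene_consistency(length, 120.0, sigma=15.0), 4))
```

A score near 1 would support the sceneconsistent predicate for the hypothesis, while a score near 0 would count as evidence against it.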

Context based rules may be present for a system that has to handle occlusions. The idea here is that if the system does not detect a particular threatening object part, then it should be able to explain its absence for the hypothesis to be considered valid. If it fails to explain a missing threatening object part, then this is construed as evidence against the hypothesis being the object. Absence of threatening object parts may be detected using logic programming's ‘negation as failure’ operator (not). A valid explanation for a missing threatening object part could either be due to occlusions by static objects or due to occlusions by other objects.

Explaining missed detections due to occlusions by static objects is straightforward. At setup, in one embodiment, all static occlusions are marked. Image boundaries may also be treated as occlusions and marked. For a given hypothesis, the fraction of overlap of the missing threatening object part with the static occlusion is computed and reported as the uncertainty of occlusion. The process is similar for occlusions by other threatening object hypotheses, with the only difference being that, in addition to the degree of occlusion, the degree of confidence of the hypothesis that is responsible for the occlusion is also taken into account, as illustrated in the second rule in FIG. 8.
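The overlap computation described above can be sketched as follows, assuming axis-aligned bounding boxes given as (x0, y0, x1, y1). The box representation, the function names, and the use of a product to fold in the occluder's confidence are assumptions made for illustration; the text states only that the occluder's confidence is taken into account.

```python
def overlap_fraction(part, occluder):
    """Fraction of the part box (x0, y0, x1, y1) covered by the occluder box."""
    px0, py0, px1, py1 = part
    ox0, oy0, ox1, oy1 = occluder
    width = max(0.0, min(px1, ox1) - max(px0, ox0))
    height = max(0.0, min(py1, oy1) - max(py0, oy0))
    area = (px1 - px0) * (py1 - py0)
    return (width * height) / area if area > 0 else 0.0

def occlusion_uncertainty(part, occluder, occluder_confidence=1.0):
    """Static occluders use confidence 1.0; other hypotheses pass their own."""
    return overlap_fraction(part, occluder) * occluder_confidence

missing_guard = (0.0, 0.0, 10.0, 10.0)
print(occlusion_uncertainty(missing_guard, (5.0, 0.0, 20.0, 10.0)))        # static
print(occlusion_uncertainty(missing_guard, (5.0, 0.0, 20.0, 10.0), 0.8))   # dynamic
```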

This rule will check to see if knife(X,Y,S)'s guard is occluded by another knife(Xo,Yo,So) under the condition that Yo > Y, meaning the occluded knife is behind the ‘occluder’. It is important to note that this would induce a scene geometry constrained hierarchy in the parse graph, since whether or not a given hypothesis is a knife depends on whether or not a hypothesis in front of it was inferred as being a valid pattern of interest. There exist similar rules for other components and also rules deriving ¬knife in the absence of explanations for missing parts.

A predicate logic based reasoning framework can be efficiently implemented in a logic programming language like Prolog. Distributions of Prolog, like SWI-Prolog, allow for the straightforward integration of C++ with an embedded Prolog reasoning engine. Predefined rules can be inserted into the Prolog engine's knowledge base at setup time by the C++ module, along with information about scene geometry and other constraints. At runtime, the C++ module can apply the detectors on the given image, preprocess the feature detector output if needed, syntactically structure this output as logical facts, and finally insert it into the Prolog knowledge base. These detections then serve as initial hypotheses upon which the query can be performed. Since rules contain unbound variables and observed facts contain constants as parameters, querying for a proposition in Prolog implies finding a suitable binding of the rule variables to the constants of the supporting facts. If no such binding is found, the corresponding rule does not fire.

It is important to note that complexity of general inference in predicate logics can be combinatorial. In practice, however, variable interdependencies between different atoms of a rule restrict the search space significantly. Specifically, in the pattern grammar formulation described herein, there exists significant reuse of the variables between atoms both within and across different rules. Additionally, Prolog can be set up to index facts based on specific variables, further reducing complexity of variable binding.
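The variable binding described above, and the way shared variables prune the search, can be illustrated with a toy matcher. This Python sketch is a stand-in for a Prolog engine and does not use the SWI-Prolog interface; the representation of atoms as (name, arguments) tuples and the helper names bind and solve are assumptions.

```python
def bind(pattern, fact, env):
    """Try to extend env so that pattern matches fact; return None on failure."""
    name, args = pattern
    fact_name, fact_args = fact
    if name != fact_name or len(args) != len(fact_args):
        return None
    env = dict(env)
    for arg, value in zip(args, fact_args):
        if isinstance(arg, str) and arg[:1].isupper():   # Prolog-style variable
            if env.setdefault(arg, value) != value:       # conflicts with earlier binding
                return None
        elif arg != value:                                 # constant mismatch
            return None
    return env

def solve(body, facts, env=None):
    """Yield every binding that satisfies all atoms of a rule body."""
    env = {} if env is None else env
    if not body:
        yield env
        return
    for fact in facts:
        new_env = bind(body[0], fact, env)
        if new_env is not None:
            yield from solve(body[1:], facts, new_env)

facts = [
    ("blade", (25, 95, 0.9)),
    ("handle", (25, 95, 0.9)),
    ("handle", (40, 10, 0.5)),
]
body = [("blade", ("X", "Y", "S")), ("handle", ("X", "Y", "S"))]
print(list(solve(body, facts)))
```

Because X, Y and S are reused in both atoms, the second handle fact is rejected as soon as its first argument fails to match the binding made by the blade fact, which is the pruning effect described above.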

FIG. 11 illustrates a method that can be used to perform the above-described training of a component detector according to an exemplary embodiment of the invention. Referring to FIG. 11, the method includes determining uncertainty values for the rules (S1101), converting the rules into a neural network where the uncertainty values correspond to weights of links in the network (S1102), and using a back-propagation modified to allow local gradients over a bilattice specific inference to optimize the link weights (S1103).

An instantiated inference tree may be cast from the rules as the neural network (e.g., a knowledge-based neural network). The modified back-propagation algorithm is used to converge upon a set of rule weights that give optimal performance.

Traditionally, artificial neural networks (ANNs) are modeled as black boxes. Given a set of input and output variables, and training data, a network is created in which the input nodes correspond to the input variables and the output nodes correspond to the output variables. Depending on the nature of the problem to be solved and a priori assumptions, a number of nodes are introduced between the input and output nodes that are termed hidden nodes. Each link connecting two nodes is assigned a link weight. Learning in an ANN implies optimizing link weights to minimize the mean squared error between the network predicted output and ground truth, given input data. In such networks, the intermediate hidden nodes don't necessarily have to be meaningful entities.

In knowledge based ANNs (KBANN), all nodes, hidden or not, have a semantically relevant interpretation. This semantic interpretability arises out of careful construction of the KBANN. In an exemplary embodiment, the KBANN will be constructed from the rules. Each node of the KBANN therefore directly corresponds to an instantiated atom of the rules, while link weights correspond to rule weights. Given the rules, optimizing the rule weights is a two step process. Step 1 is to use the rules and facts to create a KBANN and step 2 is to use a modified version of the standard back-propagation algorithm to optimize the link weights of the KBANN, thus in turn optimizing the rule weights in the original rules.

The first step in a learning algorithm according to an exemplary embodiment of the invention is to convert the rules to a representation of a knowledge-based artificial neural network. Consider a set of rules, such as those depicted in FIG. 8. Given a set of training data, in the form of observed logical facts and associated ground truth, the first step is to generate a grounded, propositional representation for each of the rules. Below is one such set of propositional rule representations.

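$\begin{matrix}{\varphi\left( j \leftarrow o_{11}, o_{12}, o_{13} \right) = w_{j1}^{+}} \\{\varphi\left( j \leftarrow o_{21}, o_{22}, o_{13} \right) = w_{j2}^{+}} \\{\varphi\left( \neg j \leftarrow o_{31}, o_{32}, o_{13} \right) = w_{j3}^{-}} & (5)\end{matrix}$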

where each term, j, o₁₁, o₁₂, etc., represents a grounded atom such as knife(23, 47, 0.4), blade(43, 55, 0.9), etc. The weights associated with these propositional rules correspond to the evidence_for component of the original rules. For a given rule, only the evidence_for component of the uncertainty attached to the rule is relevant. The evidence_against component of the rule weight gets discarded during the inference due to the disjunction with <0,0> (see equation (4)). Given a proposition, j, to be reasoned about, positive rules will contribute evidence supporting j, while negative rules will contribute evidence refuting it. The evidence_for component of the negative rule will contribute to the evidence_against component of the proposition to be reasoned about due to the negation (refer to the example in Table 1 for more details). This grounded, propositional rules representation can now be directly used to construct the artificial neural network. In such a network, observed features (logical facts) become the input nodes, while propositions corresponding to the rule heads become output nodes and are placed at the top of the network. Rule weights become link weights in the network.

FIG. 9 shows the KBANN derived from the set of grounded, propositional rules from (5). Conjuncts within a single rule may need to first pass through a conjunction node before reaching the consequent node where, along with the weights, they would get combined with contributions from other rules in a disjunction. In FIG. 9, the links connecting the conjuncts to the product node are depicted using solid lines. This indicates that this weight is unadjustable and is always set to unity. Only the weights corresponding to the links depicted in dotted lines are adjustable as they correspond to the rule weights.

Consider a simple ANN as shown in FIG. 10. In traditional back-propagation, the output of an output node is:

$\begin{matrix}{d_{j} = {\sigma\left( z_{j} \right)} = {\frac{2}{1 + e^{- \lambda z_{j}}} - 1}} & (6)\end{matrix}$

where σ is the sigmoid function and where

$\begin{matrix}{z_{j} = {{\varphi (j)} = {\sum\limits_{i}{w_{ji}{\sigma \left( {\varphi \left( o_{i} \right)} \right)}}}}} & (7)\end{matrix}$

The error at the output node is

$\begin{matrix}{E = {\frac{1}{2}{\sum\limits_{j}\left( {t_{j} - d_{j}} \right)^{2}}}} & (8)\end{matrix}$

where t_(j) is the ground truth for node j. Based on this measure of error, the change of a particular link weight is set to be proportional to the rate of change of error with respect to that link weight. Thus

$\begin{matrix}{{\Delta \; w_{ji}} \propto {- \frac{\partial E}{\partial w_{ji}}}} & (9)\end{matrix}$

Using standard back-propagation calculus, the change in link weight can be computed to be

$\begin{matrix}{{\Delta w_{ji}} = {\eta \delta_{j}{\sigma\left( {\varphi\left( o_{i} \right)} \right)}}} & (10)\end{matrix}$

where

$\begin{matrix}{\delta_{j} = {\left( {t_{j} - d_{j}} \right)\frac{\partial{\sigma\left( z_{j} \right)}}{\partial z_{j}}}} & (11)\end{matrix}$

if j is an output node and

$\begin{matrix}{\delta_{j} = {\frac{\partial{\sigma\left( z_{j} \right)}}{\partial z_{j}}{\sum\limits_{k \in {DS}(j)}{\delta_{k}w_{kj}}}}} & (12)\end{matrix}$

if j is a non-output node, where DS(j) is the set of nodes downstream from j.
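For a single output node, the standard update of equations (6), (7), (10) and (11) can be sketched as below. The function names, the learning rate η = 0.1 and the slope λ = 1 are illustrative assumptions. The derivative used is ∂σ/∂z = (λ/2)(1 − σ(z)²), which follows from the sigmoid form in equation (6).

```python
import math

def sigma(z, lam=1.0):
    """Bipolar sigmoid of equation (6), with output in (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-lam * z)) - 1.0

def dsigma(z, lam=1.0):
    """Derivative of sigma with respect to z."""
    s = sigma(z, lam)
    return 0.5 * lam * (1.0 - s * s)

def output_update(weights, inputs, target, eta=0.1, lam=1.0):
    """One gradient step on the link weights feeding a single output node."""
    acts = [sigma(x, lam) for x in inputs]                  # sigma(phi(o_i))
    z = sum(w * a for w, a in zip(weights, acts))           # equation (7)
    d = sigma(z, lam)                                       # equation (6)
    delta = (target - d) * dsigma(z, lam)                   # equation (11)
    return [w + eta * delta * a for w, a in zip(weights, acts)]  # equation (10)

print(output_update([0.2, -0.1], [0.5, 1.0], target=1.0))
```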

These equations need to be extended to the KBANN depicted in FIG. 9. This involves computing gradients over the bilattice specific inference operation. Recall that in the bilattice based logical reasoning approach, inference is performed by computing the closure over a logic program using (4). This equation can be simplified as

$\begin{matrix}{z_{j} = {\varphi(j)} = {\overset{+ {ve}}{\underset{i}{\oplus}}{w_{ji}^{+} \land \left\lbrack {\bigwedge\limits_{l}{\varphi\left( o_{il} \right)}} \right\rbrack}} \oplus {\neg {\overset{- {ve}}{\underset{i}{\oplus}}{w_{ji}^{-} \land \left\lbrack {\bigwedge\limits_{l}{\varphi\left( o_{il} \right)}} \right\rbrack}}}} & (13)\end{matrix}$

Note that this equation represents a general form of the closure operation before a commitment has been made on the underlying lattice structure and its corresponding glb and lub operators. Once the choice of the underlying lattice and corresponding operators has been made, in conjunction with equations (8), (9) and (13), it should be possible to compute the rate of change of each of the rule weights.

Consistent with earlier description, the underlying lattice will be chosen to be L=([0,1],≦) and the t-norm will be chosen to be J(a,b)=a ∧_(L) b=ab and the t-conorm as S(a,b)=a ∨_(L) b=a+b−ab. As defined earlier, the glb and lub operators for each axis of the bilattice B can then be defined as per equation (1). Plugging these operator instantiations in equation (13), it can be further simplified to

$\begin{matrix}{z_{j} = {\underset{i}{\overset{+ {ve}}{\uplus}}{w_{ji}^{+}{\prod\limits_{l}{\varphi\left( o_{il} \right)}}}} - {\underset{i}{\overset{- {ve}}{\uplus}}{w_{ji}^{-}{\prod\limits_{l}{\varphi\left( o_{il} \right)}}}}} & (14)\end{matrix}$

where a ⊎ b = a + b − ab.
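The forward computation of equation (14) can be sketched as follows. Each rule is represented, by assumption, as a pair of a weight and a list of the evidence-for values of its conjuncts; the names prob_sum and forward are hypothetical. The example reuses the numbers of Table 1.

```python
from functools import reduce
from math import prod

def prob_sum(values):
    """Probabilistic sum: folds a ⊎ b = a + b - a*b over the values."""
    return reduce(lambda a, b: a + b - a * b, values, 0.0)

def forward(pos_rules, neg_rules):
    """Equation (14): support from positive rules minus support from negative rules.

    Each rule is (weight, [phi(o_i1), phi(o_i2), ...]).
    """
    support = prob_sum(w * prod(obs) for w, obs in pos_rules)
    refute = prob_sum(w * prod(obs) for w, obs in neg_rules)
    return support - refute

z = forward(
    pos_rules=[(0.40, [0.90]), (0.30, [0.70])],   # blade and handle rules
    neg_rules=[(0.90, [0.80])],                    # scene consistency rule
)
print(round(z, 4))
```

The result is the difference between the evidence-for and evidence-against components of the pair computed in Table 1, so a negative value indicates that the refuting evidence dominates.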

Note that, unlike the traditional output equation for back-propagation, equation (7), this formulation is slightly more complex due to the combination of observation nodes via the conjunction (product) node and then further combination of outputs of multiple rules via disjunction (probabilistic sum). The probabilistic sum of weights, ⊎_(i) w_(i), can be easily differentiated with respect to a given weight w_(k) as follows:

$\begin{matrix}{\frac{\partial\left( {\uplus_{i}w_{i}} \right)}{\partial w_{k}} = {1 - {\underset{i \neq k}{\uplus}w_{i}}}} & (15)\end{matrix}$

Using equations (14) and (15), the gradients can be computed to be

$\begin{matrix}{\frac{\partial E}{\partial w_{ji}^{+}} = {- \left( {t_{j} - d_{j}} \right)\left\lbrack {\prod\limits_{l}{\varphi\left( o_{il} \right)}} \right\rbrack\left\lbrack {1 - {\underset{m \neq i}{\uplus}{w_{jm}^{+}{\prod\limits_{l}{\varphi\left( o_{ml} \right)}}}}} \right\rbrack}} & (16) \\{\frac{\partial E}{\partial w_{ji}^{-}} = {\left( {t_{j} - d_{j}} \right)\left\lbrack {\prod\limits_{l}{\varphi\left( o_{il} \right)}} \right\rbrack\left\lbrack {1 - {\underset{m \neq i}{\uplus}{w_{jm}^{-}{\prod\limits_{l}{\varphi\left( o_{ml} \right)}}}}} \right\rbrack}} & (17)\end{matrix}$

The rate of change of each rule weight can be computed as follows:

$\begin{matrix}{{\Delta \; w_{ji}^{+}} = {{{\eta\delta}_{j}\left\lbrack {\prod\limits_{t}\; {\varphi \left( o_{il} \right)}} \right\rbrack}\left\lbrack {1 - {\underset{k \neq m}{\uplus}{w_{jm}^{+}{\prod\limits_{l}\; {{\varphi \left( o_{ml} \right)}\lbrack}}}}} \right.}} & (18) \\{{\Delta \; w_{ji}^{-}} = {- {{{\eta\delta}_{j}\left\lbrack {\prod\limits_{t}\; {\varphi \left( o_{il} \right)}} \right\rbrack}\left\lbrack {1 - {\underset{k \neq m}{\uplus}{w_{jm}^{-}{\prod\limits_{l}\; {{\varphi \left( o_{ml} \right)}\left\lbrack {where} \right.}}}}} \right.}}} & (18) \\{\delta_{j} = {t_{j} - d_{j}}} & (19)\end{matrix}$

if j is an output node and

$\begin{matrix}{\delta_{j} = {\sum\limits_{m \in {DS}(j)}{\delta_{m}w_{mj}{\prod\limits_{l \neq j}{\varphi\left( o_{jl} \right)}}\left\lbrack {1 - {\underset{k \neq j}{\uplus}{w_{mk}{\prod\limits_{l}{\varphi\left( o_{kl} \right)}}}}} \right\rbrack}}} & (20)\end{matrix}$

if j is a non-output node, where DS(j) is the set of nodes downstream from j.

Once the gradient has been analytically computed, there are a number of techniques to perform the actual optimization. In one embodiment, an online weight update is performed, where for each data point the gradient is computed and used to instantaneously modify the rule weight. This is in contrast to a batch approach where the cumulative gradient of a batch of data points is used to update the weights. It is believed that an online approach such as the one adopted is better suited for applications with limited access to annotated data.
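A single online update for an output node, following equations (14), (18) and (19), might look like the sketch below. The rule representation, the learning rate, and the clamping of weights to [0, 1] are assumptions made for illustration; the text does not state how weights are kept in range.

```python
from functools import reduce
from math import prod

def prob_sum(values):
    return reduce(lambda a, b: a + b - a * b, values, 0.0)

def step(pos, neg, target, eta=0.05):
    """One online update. pos/neg are lists of (weight, [phi(o_il), ...]).

    Returns the updated positive rules, the updated negative rules, and the
    output d_j computed before the update.
    """
    def terms(rules):
        return [w * prod(obs) for w, obs in rules]

    d = prob_sum(terms(pos)) - prob_sum(terms(neg))          # equation (14)
    delta = target - d                                        # equation (19)

    def updated(rules, sign):
        t = terms(rules)
        out = []
        for i, (w, obs) in enumerate(rules):
            others = prob_sum(t[:i] + t[i + 1:])              # sum over m != i
            grad = prod(obs) * (1.0 - others)
            new_w = w + sign * eta * delta * grad             # equation (18)
            out.append((min(1.0, max(0.0, new_w)), obs))      # clamp: an assumption
        return out

    return updated(pos, +1), updated(neg, -1), d

pos = [(0.40, [0.90]), (0.30, [0.70])]
neg = [(0.90, [0.80])]
new_pos, new_neg, d_before = step(pos, neg, target=1.0)
print(round(d_before, 4), [round(w, 4) for w, _ in new_pos + new_neg])
```

For a positive ground truth, the positive rule weights grow and the negative rule weight shrinks, which moves the output toward the target on the next evaluation of the same data point.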

The reasoning system provides not only a powerful means to detect complex objects, but it can also provide an explanation of why an object is inferred to be a threat. This information can be directly taken from the “parse tree” of the inference. For example, for the pistol threat it can provide the instantiated lock, stock, and barrel features, and the predicates that describe how those are connected, combined or put into context. This is valuable information that can enhance a user interface with the human in the loop. A visual interpretation of the parse tree can be presented directly in the image, highlighting the compositional and contextual relation between the low-level components. For example, if a weapon is disassembled, on a press of a button, it can be shown in a configuration suggested by the instantiated set of rules, etc. The detected and segmented parts of a disassembled pistol, explosive threat, etc., can also be animated in a “virtual assembly” for visual verification by the human operator. This provides the user with intuitive suggestions to interpret the results from the automated processing.

A further exemplary embodiment of the invention attempts to bridge the gap between the need of designing a robust, automated, video surveillance system, and the capabilities of current low level video analytics modules. This embodiment has the potential to greatly advance intelligence, surveillance and reconnaissance capabilities of an automated surveillance system, thus in turn enhancing homeland security.

Reliably extracting patterns of human or vehicular activities in video can be difficult. Complex visual patterns tend to be compositional and hierarchical. For example, an image of a human can be thought to be composed of a head, torso and limbs. The image of the head is composed of hair and a face. The image of the face is composed of eyes, a nose, a mouth, etc. Such visual patterns tend to be challenging to detect robustly as a whole, due to a high degree of variability in shape, appearance, occlusions, articulation, and image noise among other factors.

An exemplary embodiment of the invention employs automated video surveillance modules that are based on advanced symbolic reasoning that sits on top of the current computer vision technologies. Knowledge of the patterns of human, vehicular, or boat activity is represented in a hierarchical, compositional manner to exploit this knowledge, in conjunction with the output of low level image level features, to effectively search for the presence of the patterns of interest in video.

In an exemplary embodiment of the invention, a first order predicate logic based reasoning framework is mated with the probabilistic output of current image analytics modules. The system described with respect to FIGS. 1-11 can be adapted to detect activities from patterns of interest in video. For example, assume the activity of illegal drug trafficking is associated with video clips of a weapon, bagged items, and a certain type of boat. A pattern grammar describing this illegal drug trafficking can then be encoded using a plurality of first order logic based predicate rules. For example, the rules could indicate that drug trafficking is present when the weapon and the bagged items are a certain distance away from the boat. Then a component detector can be trained to identify each component of the drug trafficking, for example, one for detecting the weapon, one for detecting the bagged items, and another for detecting the boat. Then video data can be processed with the component detectors to identify at least one of the components of the drug trafficking, and the rules can be executed to determine whether or not drug trafficking is present. The training of these component detectors can be performed in a manner that is similar to that described above for the component detectors that detect the threatening object.
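The distance-based rule in the example above can be sketched in a few lines. The detection format (a dict with a position and a confidence), the distance threshold, and the use of a product to combine detector confidences are all assumptions for illustration and are not specified in the text.

```python
import math

def near(a, b, max_dist):
    """True when two detections lie within max_dist of each other."""
    return math.dist(a["pos"], b["pos"]) <= max_dist

def trafficking_support(weapon, bags, boat, max_dist=50.0):
    """Evidence for the activity: all components present and close to the boat."""
    if near(weapon, boat, max_dist) and near(bags, boat, max_dist):
        return weapon["conf"] * bags["conf"] * boat["conf"]
    return 0.0

# Hypothetical detector outputs for one video frame.
weapon = {"pos": (0.0, 0.0), "conf": 0.9}
bags = {"pos": (10.0, 0.0), "conf": 0.8}
boat = {"pos": (20.0, 0.0), "conf": 0.5}
print(round(trafficking_support(weapon, bags, boat), 4))
```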

In addition to activity recognition, such a reasoning framework can be applied to visual surveillance problems such as detection of complex objects in aerial imagery, human detection, and identity maintenance.

Robust low level video analytics modules can be designed for tasks like moving object detection, object tracking, human posture and viewpoint estimation, detection of humans carrying packages, and analysis of vehicle trajectories. These modules provide atomic primitives that will serve as input to the high level reasoning, which will then be used to detect complex combinations of human and vehicular activities for security and safety use cases.

The system may facilitate the detection of complex compositional eventsspread out over time and across multiple cameras.

In an exemplary embodiment of the invention, these modules are integrated with the Siemens' Siveillance™ surveillance platform to develop an end-to-end proof-of-concept system. When combined with Siemens' Siveillance™ video surveillance platforms, such a symbolic reasoning-based human and vehicular activity recognition approach will provide a robust solution to automated visual surveillance, making it possible to rapidly search for interesting activities in stored video or identify activities in real-time. Thus, such a system will greatly enhance situational awareness by providing proactive and predictive capabilities, thus providing advanced Intelligence, Surveillance and Reconnaissance capabilities at any site where it is deployed.

FIG. 12 shows an example of a computer system, which may implement the methods and systems of the present disclosure. The system and methods of the present disclosure, or part of the system and methods, may be implemented in the form of a software application running on a computer system, for example, a mainframe, personal computer (PC), handheld computer, server, etc. For example, the methods of FIGS. 1 and 11 may be implemented as software application(s). These software applications may be stored on a computer readable medium (such as hard disk drive memory 1008) locally accessible by the computer system and accessible via a hard wired or wireless connection to a network, for example, a local area network, or the Internet.

The computer system, referred to generally as system 1000, may include, for example, a central processing unit (CPU) 1001, a GPU (not shown), a random access memory (RAM) 1004, a printer interface 1010, a display unit 1011, a local area network (LAN) data transmission controller 1005, a LAN interface 1006, a network controller 1003, an internal bus 1002, and one or more input devices 1009, for example, a keyboard, mouse, etc. As shown, the system 1000 may be connected to a data storage device, for example, a hard disk 1008, via a link 1007. CPU 1001 may be the computer processor that performs some or all of the steps of the methods described above with reference to FIGS. 1-12.

Although the illustrative embodiments have been described herein with reference to the accompanying drawings, it is to be understood that the present invention is not limited to those precise embodiments, and that various other changes and modifications may be effected therein by one of ordinary skill in the related art without departing from the scope or spirit of the invention. All such changes and modifications are intended to be included within the scope of the invention.

1. A method of detecting an object in image data that is deemed to be a threat, the method comprising: annotating sections of at least one training image to indicate whether each section is a component of the object; encoding a pattern grammar describing the object using a plurality of first order logic based predicate rules; training distinct component detectors to each identify a corresponding one of the components based on the annotated training images; processing image data with the component detectors to identify at least one of the components; and executing the rules to detect the object based on the identified components.
 2. The method of claim 1, further comprising: generating an explanation that explains why the object is considered a threat from a parse tree of the rules; and presenting the explanation graphically to a user.
 3. The method of claim 2, wherein the explanation includes features in the object that were discovered by at least one of the component detectors.
 4. The method of claim 3, wherein the explanation describes how those features are connected to one another.
 5. The method of claim 2, wherein the explanation includes a graphical virtual assembly of the threatening object from the component parts.
 6. The method of claim 1, wherein the training of the component detectors is performed using Adaptive Boosting.
 7. The method of claim 1, wherein the pattern grammar is implemented as instructions in a processor, and executing of the rules is performed by the processor executing the instructions.
 8. The method of claim 1, wherein the object is a knife and the annotated sections indicate whether each component is one of a handle, a guard, and a blade of the knife.
 9. The method of claim 1, wherein the object is a gun and the annotated sections indicate whether each component is one of a lock, a stock, and a barrel of the gun.
 10. The method of claim 1, wherein the object is a detonator and the annotated sections indicate whether each component is one of a tube and an explosive material.
 11. The method of claim 1, wherein the object is a bomb and the annotated sections indicate whether each component is one of a detonator, explosive material, a cable, and a battery.
 12. The method of claim 1, wherein the image data is X-ray image data.
 13. The method of claim 11, wherein the image data is computed tomography (CT) data.
 14. The method of claim 1, wherein the training comprises: determining uncertainty values for each of the rules; converting the rules into a knowledge-based artificial neural network, where each uncertainty value corresponds to a weight of a link in the neural network; and using a back-propagation algorithm modified to allow local gradients over a bilattice specific inference operation to optimize the link weights.
 15. The method of claim 1, wherein the pattern grammar describes a visual pattern of the object by encoding knowledge about contextual clues, scene geometry, and visual pattern constraints.
 16. The method of claim 1, wherein the training of a corresponding one of the component detectors comprises: performing a physics-based perturbation on one of the annotated training images to generate a new annotated training image; and training the distinct component detectors based on the annotated training images and the new annotated training image.
 17. The method of claim 16, wherein the perturbation is a geometric transformation.
 18. The method of claim 1, wherein the performing of the perturbation comprises adding another object to be superimposed with a component in the training image to generate the new annotated training image.
 19. A computer readable storage medium embodying instructions executable by a processor to perform method steps for detecting an object in image data that is deemed to be a threat, the method steps comprising instructions for: annotating sections of at least one training image to indicate whether each section is a component of the object; encoding a pattern grammar describing the object using a plurality of first order logic based predicate rules; training distinct component detectors to each identify a corresponding one of the components based on the annotated training images; processing image data with the component detectors to identify at least one of the components; and executing the rules to detect the object based on the identified components.
 20. A method of training a threat detector to detect an object in image data that is deemed to be a threat, the method comprising: defining a pattern grammar to describe a visual pattern that is representative of the object; encoding the pattern grammar using a plurality of first order predicate based logic rules; and dividing an object into component parts; training distinct component detectors to each detect a corresponding one of the component parts; and generating the threat detector from the rules.
 21. The method of claim 20, wherein the pattern grammar is implemented as instructions in a processor.
 22. The method of claim 19, wherein the training comprises: determining uncertainty values for each of the rules; converting the rules into a knowledge-based artificial neural network, where each uncertainty value corresponds to a weight of a link in the neural network; and using a back-propagation algorithm modified to allow local gradients over a bilattice specific inference operation to optimize the link weights.
 23. A method of detecting an activity in video data, the method comprising: annotating sections of at least one training video to indicate whether each section is a component of the activity; encoding a pattern grammar describing the object using a plurality of first order logic based predicate rules; training distinct component detectors to each identify a corresponding one of the components based on the annotated training videos; processing video data with the component detectors to identify at least one of the components; and executing the rules to detect the activity based on the identified components. 